Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

livecheck: add GithubLatest strategy #9381

Merged
merged 6 commits into from Dec 7, 2020
Merged

livecheck: add GithubLatest strategy #9381

merged 6 commits into from Dec 7, 2020

Conversation

nandahkrishna
Copy link
Member

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew tests with your changes locally?
  • Have you successfully run brew man locally and committed any changes?

This PR adds a GithubLatest strategy for Livecheck, to check the latest release marked on a project's GitHub repository.

If my greping was right, there are about 214 Formulae with livecheck blocks to check the latest release, so this strategy would reduce replication of standard URL formats and regexes across multiple Formulae.

There aren't too many exceptions to the default regex (eg: when the tag name contains the Formula name, non-version characters or version numbers separated by - or _) – I was unsure whether to factor in those cases as well.

@BrewTestBot
Copy link
Member

Review period will end on 2020-12-03 at 12:59:20 UTC.

@BrewTestBot BrewTestBot added the waiting for feedback Merging is blocked until sufficient time has passed for review label Dec 2, 2020
@samford
Copy link
Member

samford commented Dec 2, 2020

You must have picked up my brainwaves, as this was the first PR I was going to create after finishing the docs PR, ahah. I've had this strategy implemented locally for months but I didn't have the opportunity to sneak it in during GSoC (and post-GSoC I promised myself to the docs before working on notable changes to livecheck myself). This strategy's definitely needed (and will save us from replicating similar livecheck blocks over and over), so I'll add a review in a little bit to address a few areas.

@nandahkrishna
Copy link
Member Author

Haha, looking forward to your review. I wanted to add this strategy earlier but got a little too occupied with other stuff 😅

@maxim-belkin
Copy link
Member

I've had this strategy implemented locally for months

so I'll add a review in a little bit

🙈

Copy link
Member

@maxim-belkin maxim-belkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a few (minor) comments below (above?), 👍 from me.

Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another pass through this later (specifically the documentation comments) but this should help to move things along.

Copy link
Member

@samford samford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of the areas where I could add suggestions above, we should skip #preprocess_url for the GithubLatest strategy as well (in livecheck/livecheck.rb#latest_version). Currently, the URL is processed into the .git URL and then the GithubLatest strategy uses that to create the "latest" URL, so the intermediate preprocessing step is simply a waste of effort.

I can push a commit after the suggested changes have been incorporated but here's a patch that addresses this:

diff --git a/Library/Homebrew/livecheck/livecheck.rb b/Library/Homebrew/livecheck/livecheck.rb
index 953126526055eacbacced7e17923718b60509b35..07cba1f076aac36891b0caf166b807d40db532ed 100644
--- a/Library/Homebrew/livecheck/livecheck.rb
+++ b/Library/Homebrew/livecheck/livecheck.rb
@@ -25,6 +25,11 @@ module Homebrew
       lolg.it
     ].freeze
 
+    STRATEGY_SYMBOLS_TO_SKIP_PREPROCESS_URL = [
+      :github_latest,
+      :page_match,
+    ].freeze
+
     UNSTABLE_VERSION_KEYWORDS = %w[
       alpha
       beta
@@ -381,8 +386,8 @@ module Homebrew
           next
         end
 
-        # Do not preprocess the URL when livecheck.strategy is set to :page_match
-        url = if livecheck_strategy == :page_match
+        # Only preprocess the URL when it's appropriate
+        url = if STRATEGY_SYMBOLS_TO_SKIP_PREPROCESS_URL.include?(livecheck_strategy)
           original_url
         else
           preprocess_url(original_url)

I'm using an array for the symbols because brew style will complain if we do url = if livecheck_strategy == :page_match || livecheck_strategy == :github_latest (i.e., "Avoid comparing a variable with multiple items in a conditional, use Array#include? instead.").

Library/Homebrew/livecheck/strategy/github_latest.rb Outdated Show resolved Hide resolved
@samford
Copy link
Member

samford commented Dec 2, 2020

For what it's worth, I'm working on updating the formulae in homebrew/core that currently use a GitHub "latest" URL in the livecheck block. I wrote a script that updates the related formulae and I'll create a draft PR once I'm done vetting the changes and addressing issues.

Edit: The homebrew/core PR can be found at Homebrew/homebrew-core#66134. I've labeled it "do not merge" until we finish and merge this PR. I already tested these changes (using this PR and all the changes mentioned here), identified any issues, and either addressed them in that PR or through suggested changes here.

@BrewTestBot
Copy link
Member

Review period ended.

@BrewTestBot BrewTestBot removed the waiting for feedback Merging is blocked until sufficient time has passed for review label Dec 3, 2020
@nandahkrishna
Copy link
Member Author

@samford, I've incorporated the suggestions from your review, except the patches. Please do push any other commit(s) you have, thanks!

@samford
Copy link
Member

samford commented Dec 6, 2020

I pushed some commits to address anything else that was left. Namely:

  • Updated a URL in the tests to align with the example in the related GithubLatest comment (per my previous suggestion).
  • Committed my patch to skip #preprocess_url when strategy :github_latest is found in a livecheck block (to avoid unnecessarily converting the URL to the Git URL then into the "latest" URL).
  • Committed my patch to handle strategies with a PRIORITY of 0 other than PageMatch (while still properly handling PageMatch). This reworked version should be a little easier to extend in the future, too.
  • Reworked the GithubLatest documentation comments a bit.

I did full before/after livecheck runs across homebrew/core with the current version of this PR and the related homebrew/core PR that updates livecheck blocks in related formulae (Homebrew/homebrew-core#66134) and I didn't see any unexpected differences in the output.

Everything seems to be working properly from a functional standpoint now, so this just needs some final review to identify any areas that should be addressed before merging.

Copy link
Member

@maxim-belkin maxim-belkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍
I think this can be merged as is or improved a tiny bit following some or all of my comments below.

Comment on lines +389 to +390
# Only preprocess the URL when it's appropriate
url = if STRATEGY_SYMBOLS_TO_SKIP_PREPROCESS_URL.include?(livecheck_strategy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preprocess_url function (still) has the following bit:

if host.end_with?("github.com")
return url if path.match? %r{/releases/latest/?$}
owner, repo = path.delete_prefix("downloads/").split("/")
url = "#{scheme}://#{host}/#{owner}/#{repo}.git"
elsif host.end_with?(*GITEA_INSTANCES)

Shall we remove or make changes to it?

Copy link
Member

@samford samford Dec 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that /releases/latest URLs shouldn't ever be processed and keeping this guard in place ensures this continues to be true regardless of whether #preprocess_url is skipped for some strategies.

Removing this guard would mess up checks for any formulae that use a /releases/latest URL in a livecheck block (i.e., not updated to use strategy :github_latest). PageMatch checks only skip preprocessing when strategy :page_match is present in the livecheck block (i.e., checks that implicitly use PageMatch don't skip preprocessing). Homebrew/homebrew-core#66134 will address this for homebrew/core but other taps could be affected if we removed this.

There isn't any notable benefit to removing this simple guard but there are clear detriments under certain circumstances. With that in mind, I think we should simply keep this guard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - thank you! Perhaps it would be helpful to add a comment to that block above explaining which cases it catches (formulae in non-homebrew/core taps + <some conditions> ) -- otherwise it's not obvious why we still check for github.com.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm not the target audience, do you think something short like # Handles any `livecheck` blocks not using the `GithubLatest` strategy would be sufficient? This guard would also apply if a check that uses a /releases/latest URL mistakenly sneaks into homebrew/core, so I don't think we need to go into the details of when we're most likely to encounter this situation (e.g., non-core taps).

# A priority of zero causes livecheck to skip the strategy. We do this
# for {GithubLatest} so we can selectively apply the strategy using
# `strategy :github_latest` in a `livecheck` block.
PRIORITY = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, 0 priority was equivalent (specific) to the PageMatch strategy.
Now, we introduce another strategy with the same priority. This makes the way we write conditionals that process these priorities important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea was that PRIORITY = 0 would cause a strategy to be omitted unless it's specified in a livecheck block (e.g., strategy :github_latest). I think the updated logic in Livecheck::Strategy#from_url more accurately represents this, as it now ensures the strategy can be applied when specified in a livecheck block.

Since PageMatch was the only strategy with a priority of zero up to this point and it was special-cased, I forgot to account for the from_symbol(livecheck_strategy) != strategy condition originally. Outside of this omission, we haven't treated PRIORITY = 0 as being equivalent to strategy == PageMatch in livecheck's code and we specifically reference PageMatch (or :page_match) where appropriate.

PageMatch is an exception to the above idea, rather than the rule. The PageMatch strategy will automatically apply if a livecheck block contains a url and regex, so it's not omitted like other zero-priority strategies. The reasons why it has a priority of zero is to be able to enforce the regex requirement before applying it and to also deprioritize it with respect to more specific strategies (e.g., those with the DEFAULT_PRIORITY of 5 or a higher PRIORITY).


I've thought about updating the #match? method in strategies to allow for contextual information to be passed in. I imagine this would either be the Livecheck object (i.e., the information in the formula's livecheck block) or maybe an optional options hash (to allow for more flexibility/extensibility in the future). With this setup, we could give PageMatch a priority of 1 (deprioritized but not omitted) and enforce the regex requirement in PageMatch#match?.

I believe this is a better setup overall, as we wouldn't violate the PRIORITY = 0 omission idea above and also it would allow us to keep this information in strategies instead of Strategy#from_url. I'll create a PR for this after I've finished the livecheck docs PR (and I allow myself to work on livecheck's internals again) and we can discuss it further there.


That said, I believe the related changes in this PR are sufficient for now but let me know if I've overlooked anything important.

!strategy::PRIORITY.positive? &&
from_symbol(livecheck_strategy) != strategy
# Ignore strategies with a priority of 0 or lower, unless the
# strategy is specified in the `livecheck` block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We explicitly state on line 17 of this file that strategies with priorities of 0 or lower are ignored.
Shall we expand and say that "negative priorities raise an error (they don't at the moment, but they probably should), 0 priority is reserved for..., all normal strategies should use positive values below 10" ?

Why do we allow such strategies when they're specified in the livecheck block? I guess this comment should explain why we're doing this instead of or in addition to what it currently says.

Copy link
Member

@samford samford Dec 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment on line 15 mentions, the 0-10 range is informal. It's technically possible for us to go above or below this range if it becomes necessary in the future and that's why we don't strictly enforce the range in the code.

The only ideas that we strictly enforce in the code are:

  • The default priority is 5
  • Strategies with a non-positive priority (0 or below) are omitted unless specified in the livecheck block
  • Strategies that apply to a URL are arranged from highest priority to lowest

I do think this explanation can be improved a little (e.g., I didn't mention here that the way to apply a strategy with a non-positive priority is using strategy in a livecheck block), so I'll take care of that in a moment.

# @param regex [Regexp] a regex used for matching versions in content
# @return [Hash]
def self.find_versions(url, regex = nil)
%r{github\.com/(?:downloads/)?(?<username>[^/]+)/(?<repository>[^/]+)}i =~ url.sub(/\.git$/i, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked Homebrew/core, and the following packages use /downloads/ subfolder:

  • apktool.rb
  • btpd.rb
  • cssembed.rb
  • growly.rb
  • henplus.rb
  • jadx.rb
  • midgard2.rb
  • rpg.rb
  • syck.rb
  • tidyp.rb
  • wrk-trello.rb

My understanding is that /downloads folders have been deprecated for a while now -- see https://github.blog/2012-12-12-goodbye-uploads/. I wonder if we should remove downloads from here and update the above formulae to use proper download paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already been through the /downloads/ formulae recently (as part of another PR) and I took care of the straightforward ones, so this is what remains. As you would guess, most of these are highly outdated and tend to have low install counts in our analytics. The repository uploads URLs continue to resolve, so updating these is low priority in comparison to work that can have a greater impact.

Can potentially be updated

  • apktool.rb: The /downloads/ URL is a resource from another GitHub repository that's used in the test block but I didn't see an alternative location for this particular file. The formula could be updated to use a different apk if a suitable replacement can be found.
  • btpd.rb: The formula would need to be updated to build using the 0.16 tag archive.
  • jadx.rb: This uses the same apk file as the apktool formula, so the situation is the same.
  • midgard2.rb: The formula would need to be updated to build using the 12.09.1 tag archive.
  • rpg.rb: The formula would need to be updated to build using the 0.3.0 tag archive. This release is from 2010, the repository is archived (I deprecated the formula in the past), and there have only been 12 installs over the past year, so it may not be worth the effort.
  • tidyp.rb: The formula would need to be updated to build using the 1.04 tag archive.

Unlikely that we can update

  • cssembed.rb: stable is a jar file and I don't see alternatives. The GitHub repository is archived (I deprecated the formula in the past), so it's not worth the effort.
  • growly.rb: The repository doesn't have any tags or releases (and we're not going to use the growly file from master). Besides that, the code hasn't been modified in nine years and the repository doesn't have a license file (missing license file for the repo ryankee/growly#7).
  • henplus.rb: The repository doesn't have any tags or releases and the formula is deprecated since it doesn't build. There's been some activity here and there in recent years but we're stuck unless they create a release.
  • syck.rb: This is one of the two formulae using a github.s3.amazonaws.com/downloads URL. There aren't any alternatives (to my knowledge) and it hasn't been updated in years (it's a project by why the lucky stiff and I don't imagine him coming back anytime soon).
  • wrk-trello.rb: This is one of the two formulae using a github.s3.amazonaws.com/downloads URL. There aren't any alternatives (to my knowledge) and it hasn't been updated in eight years.

@samford
Copy link
Member

samford commented Dec 6, 2020

Thinking about this some more, do we need to add some RuboCops for this strategy (in the rubocops/livecheck.rb file)? For example:

  • Suggest use of strategy :github_latest if a /releases/latest URL is used
  • If strategy :github_latest is used with a /releases/latest URL...
    • Suggest url :stable if stable is a GitHub URL
    • Suggest url :head if :stable isn't a GitHub URL but head is
    • Suggest the GitHub .git URL for the repository if neither :stable nor :head are sufficient
  • Suggest that the regex be removed if it's identical to the strategy's default regex

The second one here could potentially be implemented more like "if strategy :github_latest is used and url isn't :stable, :head, or the GitHub .git URL...", as that would catch other GitHub URLs than /releases/latest.

@nandahkrishna
Copy link
Member Author

The rubocops would be useful, in my opinion. Shall I go ahead and add them (unless you already have them, that is)?

@samford
Copy link
Member

samford commented Dec 6, 2020

Shall I go ahead and add them (unless you already have them, that is)?

You're more familiar with the livecheck RuboCops (and how to create new ones), so please feel free.

@MikeMcQuaid MikeMcQuaid merged commit ed7df8f into Homebrew:master Dec 7, 2020
@MikeMcQuaid
Copy link
Member

Thanks again @nandahkrishna!

@nandahkrishna nandahkrishna deleted the github-latest branch December 7, 2020 13:02
@nandahkrishna
Copy link
Member Author

Thanks everyone. I'm working on a new PR for the RuboCops, I hope to have it out soon with more details.

@BrewTestBot BrewTestBot added the outdated PR was locked due to age label Jan 7, 2021
@Homebrew Homebrew locked as resolved and limited conversation to collaborators Jan 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants